
TAIBOM Datapack

Introduction

The TAIBOM Datapack Schema is a JSON schema that describes and standardises collections of datasets within the Trusted AI BOM (TAIBOM) ecosystem. It gives groups of datasets, whether raw, training, or testing data, a consistent structure and records a cryptographic hash for each dataset so that its integrity can be verified.

Description

This schema captures essential metadata for datapacks, including:

  • Name (`name`): The name of the datapack.
  • Datasets (`datasets`): A non-empty list of distinct dataset objects, each containing:
    • A unique identifier (`id`), given as a URI such as a VC ID or a DID, for traceability.
    • A cryptographic hash (`hash`), e.g. SHA-256, for verifying data integrity.
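
As a rough illustration of the fields listed above, the following sketch builds a datapack document in Python. The helper names (`sha256_hex`, `build_datapack`), the `urn:uuid:` identifiers, and the dataset contents are hypothetical, and the lowercase hex encoding of the hash is an assumption; the schema only requires `hash` to be a string.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def build_datapack(name: str, datasets: dict[str, bytes]) -> dict:
    """Build a datapack dict from a mapping of dataset ID -> raw bytes.

    Hypothetical helper: the schema does not say how hashes are encoded,
    so hex-encoded SHA-256 is an assumption made for this sketch.
    """
    return {
        "name": name,
        "datasets": [
            {"id": dataset_id, "hash": sha256_hex(content)}
            for dataset_id, content in datasets.items()
        ],
    }


# Made-up identifiers and contents, for illustration only.
datapack = build_datapack(
    "Raw collection",
    {
        "urn:uuid:11111111-1111-1111-1111-111111111111": b"raw sensor readings",
        "urn:uuid:22222222-2222-2222-2222-222222222222": b"training labels",
    },
)
print(json.dumps(datapack, indent=2))
```

In practice the bytes would come from the dataset files themselves, and the identifiers from whatever VC or DID infrastructure issues them.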

Use Case

The TAIBOM Datapack Schema is primarily used within the TAIBOM framework to:

  1. Organise Dataset Collections: Provide a standardised format for grouping datasets used in AI workflows.
  2. Enable Traceability: Ensure that each dataset in the collection is uniquely identified and linked to its source.
  3. Streamline Dataset Verification: Facilitate verification and management of multiple datasets within a single datapack.

By adopting this schema, organisations can effectively manage and document groups of datasets, ensuring integrity and ease of use in AI system development and deployment.
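
The verification use case can be sketched as follows, assuming hashes are hex-encoded SHA-256 digests (the schema does not mandate an algorithm or encoding). The function `verify_datapack` and the sample identifier are hypothetical, and fetching a dataset's bytes from its `id` is left to the caller because the schema does not describe it.

```python
import hashlib


def verify_datapack(datapack: dict, contents: dict[str, bytes]) -> dict[str, bool]:
    """Map each dataset ID to True if its content matches the recorded hash.

    `contents` maps dataset ID -> raw bytes supplied by the caller.
    A dataset with no supplied content is reported as False.
    """
    results = {}
    for entry in datapack["datasets"]:
        data = contents.get(entry["id"])
        results[entry["id"]] = (
            data is not None
            and hashlib.sha256(data).hexdigest() == entry["hash"].lower()
        )
    return results


# Hypothetical datapack with a single dataset (ID and content are made up).
good = b"example dataset bytes"
pack = {
    "name": "Example pack",
    "datasets": [
        {"id": "urn:example:dataset-1", "hash": hashlib.sha256(good).hexdigest()}
    ],
}

ok = verify_datapack(pack, {"urn:example:dataset-1": good})
tampered = verify_datapack(pack, {"urn:example:dataset-1": good + b"!"})
```

Here `ok` reports the dataset as matching, while `tampered` reports a mismatch because one byte was appended to the content.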


Schemas

$id: https://github.com/nqminds/Trusted-AI-BOM/blob/main/packages/schemas/src/taibom-schemas/20-data-pack.v1.0.0.schema.yaml
$schema: https://json-schema.org/draft/2019-09/schema
title: TAIBOM Datapack
description: |
  A datapack schema that describes a collection of datasets used for raw data, training data, and testing data. Each collection has a unique identifier and a cryptographic hash for verifying data integrity.
type: object
properties:
  name:
    type: string
    description: The name of the datapack.
  datasets:
    type: array
    items:
      type: object
      properties:
        id:
          type: string
          description: The unique identifier of the dataset, such as a VC ID or a DID.
          format: uri
        hash:
          type: string
          description: The cryptographic hash (e.g., SHA-256) of the dataset for data integrity.
      required:
        - id
        - hash
    description: |
      A list of dataset objects, each containing an ID and a cryptographic hash.
    minItems: 1
    uniqueItems: true
required:
  - name
  - datasets
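
To make the constraints in the schema concrete, here is a minimal hand-written check in Python that mirrors them using only the standard library. It is a sketch, not a replacement for a JSON Schema validator: the function name `check_datapack` and its error messages are hypothetical, and the URI test is a rough stand-in for `format: uri`, which validators for draft 2019-09 may treat as an annotation rather than enforce.

```python
from urllib.parse import urlparse


def check_datapack(doc: object) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    if not isinstance(doc, dict):
        return ["datapack must be an object"]
    errors = []
    if not isinstance(doc.get("name"), str):
        errors.append("name: required string")
    datasets = doc.get("datasets")
    # minItems: 1
    if not isinstance(datasets, list) or len(datasets) < 1:
        errors.append("datasets: required array with at least one item")
        return errors
    seen = []
    for i, item in enumerate(datasets):
        if not isinstance(item, dict):
            errors.append(f"datasets[{i}]: must be an object")
            continue
        for key in ("id", "hash"):
            if not isinstance(item.get(key), str):
                errors.append(f"datasets[{i}].{key}: required string")
        # Rough stand-in for `format: uri`: require a scheme such as did: or https:.
        if isinstance(item.get("id"), str) and not urlparse(item["id"]).scheme:
            errors.append(f"datasets[{i}].id: not a URI")
        # uniqueItems compares whole items, not just their IDs.
        if item in seen:
            errors.append(f"datasets[{i}]: duplicate item")
        seen.append(item)
    return errors
```

Note that `uniqueItems` only rejects items that are identical as a whole, so two entries sharing an `id` but carrying different hashes would still pass both this check and the schema.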

Examples

name: Raw collection
datasets: two dataset objects (`id` and `hash` values not shown)